TRANSACTIONS ON BIG DATA
A Distributed

Authors

Yu Chan

Andy Wellings

Ian Gray

Neil Audsley

Abstract

Java 8 has introduced new capabilities, such as lambda expressions and streams, that simplify data-parallel computing. However, as a base language for Big Data systems, it still lacks a number of important capabilities, such as processing very large datasets and distributing the computation over multiple machines. This paper gives an overview of the Java 8 Streams API and proposes extensions that allow it to be used in Big Data systems. It also shows how the API can be used to implement a range of standard Big Data paradigms. Finally, it compares the framework's performance with that of Hadoop and Spark. Although the implementation is only a proof of concept, the results indicate that it is a lightweight and efficient framework whose performance is comparable to that of Hadoop and Spark, and that it is up to 5 times faster for the largest input sizes tested.
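The abstract does not show the proposed distributed API. The sketch below therefore uses only the standard, single-JVM Java 8 Streams API (`parallelStream`, `flatMap`, `Collectors.groupingByConcurrent`). It illustrates the kind of MapReduce-style word count that such a framework would extend across machines. The class name `WordCount` and the method `wordCount` are hypothetical illustrations and are not part of the paper's library.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical example class: standard Java 8 Streams on ONE machine,
// not the distributed extension proposed in the paper.
public class WordCount {

    // "Map" phase: flatMap splits each line into lowercase words.
    // "Reduce" phase: the concurrent grouping collector counts each word.
    static Map<String, Long> wordCount(List<String> lines) {
        return lines.parallelStream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
                .filter(word -> !word.isEmpty()) // drop empty tokens from leading punctuation
                .collect(Collectors.groupingByConcurrent(
                        Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Java 8 streams simplify data-parallel computing",
                "Streams and lambda expressions");
        // Copy into a TreeMap so the printed order is deterministic (sorted by key).
        System.out.println(new TreeMap<>(wordCount(lines)));
        // prints {8=1, and=1, computing=1, data=1, expressions=1, java=1, lambda=1, parallel=1, simplify=1, streams=2}
    }
}
```

In this sketch, `flatMap` plays the role of the map phase and the grouping collector plays the role of the reduce phase. The parallel stream runs on a single JVM's fork/join pool, so it is bounded by one machine's memory and cores. That is the limitation the abstract says the proposed extensions address.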

Similar resources

Optimization of majority protocol for controlling transactions concurrency in distributed databases by multi-agent systems

In this paper, we propose a new concurrency control algorithm based on multi-agent systems, which is an extension of the majority protocol. We then suggest a clustering approach to improve reliability and to reduce message passing and the algorithm's runtime. Here, we consider n different transactions working on non-conflicting data items. Considering execution efficiency of some different...

Opportunities in Big Data Management and Processing

Every day we witness new forms of data in various formats. Some examples include structured data from transactions we make, unstructured data such as text communications of different kinds, a variety of multimedia files, and video streams. To ensure efficient processing of this data, often called ‘Big Data’, the use of highly distributed and scalable systems and new data management architectures, e.g....

Parallel Rule Mining with Dynamic Data Distribution under Heterogeneous Cluster Environment

Big data mining methods support knowledge discovery on highly scalable, high-volume and high-velocity data elements. The cloud computing environment provides computational and storage resources for the big data mining process. Hadoop is a widely used parallel and distributed computing platform for big data analysis and manages the homogeneous and heterogeneous computing models. The MapReduce fra...

Towards the End-to-End Design for Big Data Management in the Cloud: Why, How, and When?

With the wide-scale adoption of cloud computing and with the explosion in the number of distributed applications and end-user devices, we are witnessing an insatiable desire to build bigger-and-bigger systems that can serve hundreds of millions of end-users, are highly automated, and can collect enormous amounts of data in short periods of time. Often newer systems are implemented by integrating e...

Cross-cloud MapReduce for Big Data (IEEE Transactions on Cloud Computing)

MapReduce plays a critical role as a leading framework for big data analytics. In this paper, we consider a geodistributed cloud architecture that provides MapReduce services based on the big data collected from end users all over the world. Existing work handles MapReduce jobs with a traditional computation-centric approach, in which all input data distributed across multiple clouds are aggregated to a v...

Evolving Databases for New-Gen Big Data Applications

The rising popularity of large-scale real-time analytics applications (real-time inventory/pricing, mobile apps that give you suggestions, fraud detection, risk analysis, etc.) emphasizes the need for distributed data management systems that can handle fast transactions and analytics concurrently. Efficient processing of transactional and analytical requests, however, requires different optimizat...

Publication date: 2017
